Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion
Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, Honglak Lee
There is growing interest in combining model-free and model-based approaches in reinforcement learning with the goal of achieving the high performance of model-free algorithms with low sample complexity. This is difficult because an imperfect dynamics model can degrade the performance of the learning algorithm, and in sufficiently complex environments, the dynamics model will always be imperfect. As a result, a key challenge is to combine model-based approaches with model-free learning in such a way that errors in the model do not degrade performance. We propose stochastic ensemble value expansion (STEVE), a novel model-based technique that addresses this issue. By dynamically interpolating between model rollouts of various horizon lengths, STEVE ensures that the model is only utilized when doing so does not introduce significant errors. Our approach outperforms model-free baselines on challenging continuous control benchmarks with an order-of-magnitude increase in sample efficiency.
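To make the interpolation concrete, here is a minimal Python/NumPy sketch of the reweighting step, assuming the per-horizon candidate targets and their ensemble variances have already been computed (function and variable names are illustrative, not from the authors' code): each candidate target is weighted by its normalized inverse ensemble variance, so horizons on which the model and Q-function ensembles disagree contribute little to the final target.

import numpy as np

def steve_target(candidate_targets, candidate_variances, eps=1e-8):
    """Interpolate between rollout targets of different horizon lengths.

    candidate_targets:   shape (H+1,), one TD target per rollout horizon 0..H
    candidate_variances: shape (H+1,), ensemble variance of each target

    High-variance horizons (where the learned model is unreliable) are
    down-weighted, so model errors do not dominate the final target.
    """
    weights = 1.0 / (candidate_variances + eps)  # inverse-variance weights
    weights = weights / weights.sum()            # normalize to sum to 1
    return float(np.dot(weights, candidate_targets))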
Reviews: Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion
The main algorithmic idea is a weighted combination of H-step temporal-difference estimates, computed by rolling out a learned model of the environment. The underlying idea is to let the learner trade off between estimation errors in the model and in the Q-function across different parts of the state-action space during learning. The resulting TD estimator is incorporated into the DDPG algorithm in a straightforward manner. The update is computationally more intensive, but the result is improved sample complexity. The experimental results on a variety of continuous control tasks show significant improvement over the baseline DDPG and over a related method, MVE, which is the precursor to this work. The paper is well written and the empirical results are very promising. The analysis and discussion are a bit limited, but this is not a major drawback. Overall, there is much to like about the paper.
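As a concrete illustration of the H-step targets the reviewer describes, the sketch below forms one candidate target per horizon from a single real transition (s, a, r, s_next) by unrolling a learned model. It is a simplified sketch under stated assumptions (deterministic model, episode termination ignored, all names hypothetical), not the paper's implementation.

def h_step_targets(r, s_next, model, reward_fn, policy, q_fn, H, gamma=0.99):
    """Candidate TD targets for horizons h = 0..H from one real transition.

    h = 0 is the ordinary model-free target r + gamma * Q(s', pi(s'));
    larger h unrolls the learned model from s_next, accumulating its
    predicted rewards before bootstrapping with the critic.
    (Simplified: episode termination is ignored.)
    """
    targets = []
    ret, disc, s = r, gamma, s_next
    for _ in range(H + 1):
        a = policy(s)
        targets.append(ret + disc * q_fn(s, a))  # bootstrap at horizon h
        ret += disc * reward_fn(s, a)            # model-predicted reward
        disc *= gamma
        s = model(s, a)                          # imagined next state
    return targets

Rather than committing to a single horizon, STEVE then interpolates between these candidate targets, e.g. with the inverse-variance weighting sketched after the abstract above.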